Welcome to the SIB Days 2020 - virtual conference Visium Spatial Transcriptomics workshop by 10x Genomics!
The purpose of this tutorial will be to walk users through some of the steps necessary to explore data produced by the 10x Genomics Visium Spatial Gene Expression Solution and the Spaceranger pipeline. We will investigate the data sets are all freely available from 10x Genomics.
This tutorial is largely an extension off of the primary Seurat Visium Tutorial
Things to know about this workshop
/mnt/libs/shared_data/[]
library(Seurat)
library(ggplot2)
library(patchwork)
library(dplyr)
library(RColorBrewer)
Real Dataset for the tutorial
breast_cancer <- Load10X_Spatial(data.dir = "/mnt/libs/shared_data/human_breast_cancer_1/outs/",
filename = "V1_Breast_Cancer_Block_A_Section_1_filtered_feature_bc_matrix.h5")
Same data just internal to 10x
breast_cancer <- Load10X_Spatial(data.dir = "/mnt/analysis/marsoc/pipestances/HWHTFDSXX/SPATIAL_RNA_COUNTER_PD/163086/HEAD/outs/", slice = "slice1")
There are a bunch of data sets hosted by the Satija lab in the Seurat Data Package.
Let’s have a look at some basic QC information. Keep in mind that most Seurat plots are ggplot object and can be manipulated as such.
Counts = UMI Features = Genes
plot1 <- VlnPlot(breast_cancer, features = "nCount_Spatial", pt.size = 0.1) +
ggtitle("UMI") +
theme(axis.text.x = element_blank(),
axis.title.x = element_blank(),
legend.position = "right") +
NoLegend()
plot2 <- VlnPlot(breast_cancer, features = "nFeature_Spatial", pt.size = 0.1) +
ggtitle("Genes") +
theme(axis.text.x = element_blank(),
axis.title.x = element_blank(),
legend.position = "right") +
NoLegend()
plot3 <- SpatialFeaturePlot(breast_cancer, features = "nCount_Spatial") +
theme(legend.position = "right")
plot4 <- SpatialFeaturePlot(breast_cancer, features = "nFeature_Spatial") +
theme(legend.position = "right")
plot1 + plot2 + plot3 + plot4 + plot_layout(nrow = 2, ncol = 2)
Spaceranger does normalization for clustering and DE but does not return that normalized matrix
Pre-normalization Raw UMI counts
SpatialFeaturePlot(breast_cancer, features = c("ERBB2", "CD8A"))
Don’t worry about reachediteration limit warnings. See https://github.com/ChristophH/sctransform/issues/25 for discussion
Default assay will now be set to SCT
breast_cancer <- SCTransform(breast_cancer, assay = "Spatial", verbose = FALSE)
SpatialFeaturePlot(breast_cancer, features = c("ERBB2", "CD8A"))
From Seurat:
The default parameters in Seurat emphasize the visualization of molecular data. However, you can also adjust the size of the spots (and their transparency) to improve the visualization of the histology image, by changing the following parameters:
pt.size.factor- This will scale the size of the spots. Default is 1.6
alpha - minimum and maximum transparency. Default is c(1, 1).
Try setting to alpha c(0.1, 1), to downweight the transparency of points with lower expression
p1 <- SpatialFeaturePlot(breast_cancer, features = "IGFBP5", pt.size.factor = 1)+
theme(legend.position = "right") +
ggtitle("Actual Spot Size")
p2 <- SpatialFeaturePlot(breast_cancer, features = "IGFBP5")+
theme(legend.position = "right") +
ggtitle("Scaled Spot Size")
p1 + p2
We can then proceed to run dimensionality reduction and clustering on the RNA expression data, using the same workflow as we use for scRNA-seq analysis.
Some of these processes can be parallized please see Parallelization in Seurat for more info
The default UMAP calculation is performed with the R-based UWOT library However, you can run UMAP in python via reticulate library and umap-learn. We have found that for smaller data sets (<= 10k cells/spots) UWOT is great. For much larger data sets (100k + cells/spots) umap-learn can be a faster option.
breast_cancer <- RunPCA(breast_cancer, assay = "SCT", verbose = FALSE)
breast_cancer <- FindNeighbors(breast_cancer, reduction = "pca", dims = 1:30)
Computing nearest neighbor graph
Computing SNN
breast_cancer <- FindClusters(breast_cancer, verbose = FALSE)
breast_cancer <- RunUMAP(breast_cancer, reduction = "pca", dims = 1:30)
The default method for RunUMAP has changed from calling Python UMAP via reticulate to the R-native UWOT using the cosine metric
To use Python UMAP via reticulate, set umap.method to 'umap-learn' and metric to 'correlation'
This message will be shown once per session14:26:26 UMAP embedding parameters a = 0.9922 b = 1.112
14:26:26 Read 3822 rows and found 30 numeric columns
14:26:26 Using Annoy for neighbor search, n_neighbors = 30
14:26:26 Building Annoy index with metric = cosine, n_trees = 50
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
14:26:27 Writing NN index file to temp file /tmp/RtmpTrBZqF/file780e57b16d8ce
14:26:27 Searching Annoy index using 1 thread, search_k = 3000
14:26:28 Annoy recall = 100%
14:26:28 Commencing smooth kNN distance calibration using 1 thread
14:26:29 Initializing from normalized Laplacian + noise
14:26:32 Commencing optimization for 500 epochs, with 157848 positive edges
0% 10 20 30 40 50 60 70 80 90 100%
[----|----|----|----|----|----|----|----|----|----|
**************************************************|
14:26:42 Optimization finished
Now let’s have a look at the clustering
DimPlot(breast_cancer, reduction = "umap", label = FALSE,) +
labs(color = "Cluster")
p1 <- DimPlot(breast_cancer, reduction = "umap", label = TRUE) +
labs(color = "Cluster")
p2 <- SpatialDimPlot(breast_cancer, label = TRUE, label.size = 3) +
labs(fill = "Cluster")
p1 + p2 + plot_annotation(
title = 'Clustering in UMAP and Tissue Space',
caption = 'Processed by Spaceranger 1.1\nNormilization and Clustering by Seurat'
) + plot_layout(nrow = 1)
I don’t really like these colors so let’s change them
myPalette <- colorRampPalette(rev(brewer.pal(11, "Spectral")))
p1 <- DimPlot(breast_cancer, reduction = "umap", label = TRUE) +
labs(color = "Cluster") +
scale_color_manual(values = c("#b2df8a","#e41a1c","#377eb8","#4daf4a","#ff7f00","gold",
"#a65628", "#999999", "black", "pink", "purple", "brown",
"grey", "yellow", "green"))
p2 <- SpatialDimPlot(breast_cancer, label = TRUE, label.size = 3) +
labs(fill = "Cluster")+
scale_fill_manual(values = c("#b2df8a","#e41a1c","#377eb8","#4daf4a","#ff7f00","gold",
"#a65628", "#999999", "black", "pink", "purple", "brown",
"grey", "yellow", "green"))
p1 + p2 + plot_annotation(
title = 'Clustering in UMAP and Tissue Space',
caption = 'Processed by Spaceranger 1.1\nNormilization and Clustering by Seurat'
) + plot_layout(nrow = 1)
If interested you can also now look at UMI and Gene counts per cluster as well
plot1 <- VlnPlot(breast_cancer, features = "nCount_Spatial", pt.size = 0.1) +
ggtitle("UMI") +
scale_fill_manual(values = c("#b2df8a","#e41a1c","#377eb8","#4daf4a","#ff7f00","gold",
"#a65628", "#999999", "black", "pink", "purple", "brown",
"grey", "yellow", "green"))+
theme(axis.text.x = element_blank(),
axis.title.x = element_blank(),
legend.position = "right") +
NoLegend()
plot2 <- VlnPlot(breast_cancer, features = "nFeature_Spatial", pt.size = 0.1) +
ggtitle("Genes") +
scale_fill_manual(values = c("#b2df8a","#e41a1c","#377eb8","#4daf4a","#ff7f00","gold",
"#a65628", "#999999", "black", "pink", "purple", "brown",
"grey", "yellow", "green"))+
theme(axis.title.x = element_blank(),
legend.position = "right") +
NoLegend()
plot1 + plot2
Now let’s take a look at at a gene of interest with violin plots but also in image space.
p1 <- VlnPlot(breast_cancer, features = "IGFBP5", pt.size = 0.1) +
ggtitle("IGFBP5") +
scale_fill_manual(values = c("#b2df8a","#e41a1c","#377eb8","#4daf4a","#ff7f00","gold",
"#a65628", "#999999", "black", "pink", "purple", "brown",
"grey", "yellow", "green"))+
theme(axis.title.x = element_blank(),
legend.position = "right") +
NoLegend() +
stat_summary(fun=mean, geom="point", shape=23, size=4, color="red")
p2 <- SpatialFeaturePlot(breast_cancer, features = "IGFBP5")+
theme(legend.position = "right")
CombinePlots is being deprecated. Plots should now be combined using the patchwork system.
p3 <- SpatialDimPlot(breast_cancer, label = TRUE, label.size = 3) +
labs(fill = "Cluster")+
scale_fill_manual(values = c("#b2df8a","#e41a1c","#377eb8","#4daf4a","#ff7f00","gold",
"#a65628", "#999999", "black", "pink", "purple", "brown",
"grey", "yellow", "green")) +
NoLegend()
CombinePlots is being deprecated. Plots should now be combined using the patchwork system.
row1 <- p2 + p3 + plot_layout(nrow = 1)
row1 + p1+ plot_layout(nrow = 2, widths = c(0.5, 0.5))
We can also look at these data interactively
LinkedDimPlot(breast_cancer)
First we’ll identify deferentially expressed genes (parallelization can help here).
Let’s find all the markers for every cluster
de_markers_up <- de_markers %>%
tibble::rownames_to_column(var = "Gene") %>%
arrange(-avg_logFC)
de_markers_down <- de_markers %>%
tibble::rownames_to_column(var = "Gene") %>%
arrange(avg_logFC)
SpatialFeaturePlot(object = breast_cancer, features = de_markers_up$Gene[1:10], alpha = c(0.1, 1), ncol = 3)
SpatialFeaturePlot(object = breast_cancer, features = de_markers_down$Gene[1:10], alpha = c(0.1, 1), ncol = 3)
what are the top variable features?
VariableFeatures(breast_cancer)[1:10]
[1] "IGKC" "IGHG3" "IGLC2" "MGP" "ALB" "IGHG1" "CPB1" "CRISP3" "IGLC3" "IGHM"
what are the top DE genes?
rownames(de_markers)[1:10]
[1] "CRISP3" "IGFBP5" "SLITRK6" "H3F3A" "IGFBP4" "MALAT1" "C3" "KRT18" "S100A11" "MGP"
So what about spatial enrichment?
Some methods
Using the top 100 variable genes find spatially enriched ones. Note that in the Seurat Spatial Tutorial they use 1000 genes. You can also use all genes but that will take a long time. Using a calculation of Morans I can sometimes be a faster approach, especially if you are using parallization.
breast_cancer <- FindSpatiallyVariableFeatures(breast_cancer,
assay = "SCT",
slot = "scale.data",
features = VariableFeatures(breast_cancer)[1:100],
selection.method = "markvariogram", verbose = TRUE)
Have a look at the spatially variable genes calculated by markvariogram ordered from most variable to least variable
SpatiallyVariableFeatures(breast_cancer, selection.method = "markvariogram", decreasing = TRUE)
[1] "CRISP3" "CXCL14" "MGP" "CPB1" "COX6C" "SLITRK6" "TTLL12" "CCND1" "MALAT1"
[10] "AGR2" "ALB" "GFRA1" "S100G" "CSTA" "DEGS1" "TFF3" "MT-ND1" "IGLC2"
[19] "MT-CO1" "IGHG3" "CD74" "C6orf141" "S100A6" "MT-ND2" "TFF1" "IGKC" "IGHG1"
[28] "APOE" "ZNF350-AS1" "HLA-DRA" "AC087379.2" "IGHG4" "C3" "FCGR3B" "TIMP1" "LINC00645"
[37] "IGHM" "SCGB2A2" "KRT14" "IGLC3" "KRT17" "LYZ" "APOC1" "SCGB1D2" "IGHA1"
[46] "C1QA" "AEBP1" "APOD" "KRT5" "MMP7" "CCL19" "COL6A2" "TAGLN" "S100A9"
[55] "IGHG2" "COL1A2" "DCN" "SPP1" "COL1A1" "CGA" "VIM" "IGFBP7" "FN1"
[64] "CCDC80" "CXCL9" "IGHA2" "TRBC2" "SFRP2" "KRT6B" "S100A2" "LUM" "COL3A1"
[73] "IGLC7" "SAA1" "CARTPT" "COMP" "JCHAIN" "CST1" "PTGDS" "SFRP4" "CD79A"
[82] "CCL21" "FABP4" "MUC19" "ACKR1" "POSTN" "MMP9" "S100A7" "VWF" "AQP1"
[91] "CLDN5" "ACTA2" "MS4A1" "IGLL5" "MYH11" "CXCL10" "IGHD" "HBB" "TPSB2"
[100] "AC005224.3"
top.features_trendseq <- head(SpatiallyVariableFeatures(breast_cancer, selection.method = "markvariogram"), 8)
SpatialFeaturePlot(breast_cancer, features = top.features_trendseq, ncol = 4, alpha = c(0.1, 1))
Moran’s I implementation. For other spatial data types the x.cuts and y.cuts determines the grid that is laid over the tissue in the capture area. Here we’ll remove those
breast_cancer <- FindSpatiallyVariableFeatures(breast_cancer,
assay = "SCT",
slot = "scale.data",
features = VariableFeatures(breast_cancer)[1:100],
selection.method = "moransi")
Computing Moran's I
Have a look at the spatially variable genes calculated by moransi ordered from most variable to least variable
SpatiallyVariableFeatures(breast_cancer, selection.method = "moransi", decreasing = TRUE)
[1] "CRISP3" "CXCL14" "TTLL12" "SLITRK6" "GFRA1" "CCND1" "AGR2" "COX6C" "MGP"
[10] "ALB" "MALAT1" "CPB1" "DEGS1" "C6orf141" "CSTA" "MT-ND1" "TFF3" "MT-CO1"
[19] "S100A6" "MT-ND2" "LINC00645" "FCGR3B" "S100G" "TFF1" "C3" "ZNF350-AS1" "HLA-DRA"
[28] "CD74" "SCGB1D2" "APOD" "IGLC2" "TIMP1" "IGHG3" "SCGB2A2" "APOC1" "KRT14"
[37] "LYZ" "CCDC80" "APOE" "TAGLN" "IGHG1" "AC087379.2" "CCL19" "SPP1" "IGKC"
[46] "S100A9" "KRT17" "IGHM" "FN1" "KRT5" "IGFBP7" "C1QA" "COL6A2" "AQP1"
[55] "CXCL9" "ACKR1" "CARTPT" "DCN" "AEBP1" "IGLC3" "IGHG4" "VIM" "S100A2"
[64] "IGHG2" "MMP7" "IGHA1" "CCL21" "KRT6B" "TRBC2" "COL1A2" "CGA" "SFRP2"
[73] "VWF" "COL1A1" "SAA1" "MUC19" "COL3A1" "CLDN5" "SFRP4" "POSTN" "PTGDS"
[82] "IGHA2" "JCHAIN" "CD79A" "ACTA2" "LUM" "CST1" "S100A7" "COMP" "MS4A1"
[91] "IGLC7" "MMP9" "HBB" "FABP4" "MYH11" "IGLL5" "CXCL10" "TPSB2" "IGHD"
[100] "AC005224.3"
top.features_moransi <- head(SpatiallyVariableFeatures(breast_cancer, selection.method = "moransi"), 8)
SpatialFeaturePlot(breast_cancer, features = top.features_moransi, ncol = 4, alpha = c(0.1, 1))
We can see that the results are slightly different. So let’s take a look at why.
spatially_variable_genes <- breast_cancer@assays$SCT@meta.features %>%
tidyr::drop_na()
spatially_variable_genes
You can see the two methods show
mm_cor <- cor.test(spatially_variable_genes$moransi.spatially.variable.rank, spatially_variable_genes$markvariogram.spatially.variable.rank)
ggplot(spatially_variable_genes, aes(x=moransi.spatially.variable.rank,y=markvariogram.spatially.variable.rank))+
geom_point()+
geom_smooth()+
xlab("Morans I Rank")+
ylab("Markvariogram Rank")+
annotate("text", x = 25, y = 75, label = paste("Pearson's Correlation\n", round(mm_cor$estimate[1], digits = 2), sep = ""))+
theme_bw()
Where are these genes being expressed relative to pathologist annotation?
ca <- readbitmap::read.bitmap("images/Breast Cancer Path.png")
# in the tutorial
# ca <- readbitmap::read.bitmap('/mnt/libs/shared_data/human_breast_cancer_1/images/Breast_Cancer_Path.png')
plot(0:1,0:1,type="n",ann=FALSE,axes=FALSE)
rasterImage(ca,0,0,1,1)
Looks like the Matrix Gla protein ( MGP ) gene is enriched in Ductal Carcinoma In Situ. Not a lot is known about MGP in the context of cancer but it looks like it could be an interesting novel gene to investigate with regard to Ductal Carcinoma In Situ.
SpatialFeaturePlot(object = breast_cancer, features = "MGP", alpha = c(0.1, 1), ncol = 3)
for workshop
breast_cancer_2 <- Load10X_Spatial(data.dir = "/mnt/libs/shared_data/human_breast_cancer_2/outs/",
filename = "V1_Breast_Cancer_Block_A_Section_2_filtered_feature_bc_matrix.h5")
for 10x
breast_cancer_2 <- Load10X_Spatial(data.dir = "/mnt/analysis/marsoc/pipestances/HWHTFDSXX/SPATIAL_RNA_COUNTER_PD/163087/HEAD/outs/",
slice = "slice2")
breast_cancer_merged <- merge(breast_cancer, breast_cancer_2)
Some cell names are duplicated across objects provided. Renaming to enforce unique cell names.
Counts = UMI Features = Genes
plot1 <- SpatialFeaturePlot(breast_cancer_merged, features = "nCount_Spatial") +
theme(legend.position = "right")
plot2 <- SpatialFeaturePlot(breast_cancer_merged, features = "nFeature_Spatial") +
theme(legend.position = "right")
plot1 + plot2 + plot_layout(nrow = 2)
This will take ~3min
Default assay will now be set to SCT
breast_cancer_merged <- SCTransform(breast_cancer_merged, assay = "Spatial", verbose = TRUE)
SpatialFeaturePlot(breast_cancer_merged, features = c("ERBB2", "CD8A"))
Here we have a preprocessed Seurat object with 10k nuclei annotated from a breast cancer sample. Don’t bother too much with the details of how this data was generated they don’t particularly matter for our purposes.
10x infrastructure
bc_snRNA <- readRDS("/mnt/home/stephen.williams/yard/Odin/SIB_2020_Workshop/bc_snRNA.rds")
For the workshop
bc_snRNA <- readRDS("/mnt/libs/shared_data/bc_snRNA.rds")
bc_snRNA
An object of class Seurat
56988 features across 10000 samples within 2 assays
Active assay: SCT (23450 features)
1 other assay present: RNA
3 dimensional reductions calculated: pca, harmony, umap
It’s always a good idea to rerun normalization to make sure your data is in the correct format before moving forward with integration. We’ve already preprocessed this dataset.
bc_snRNA <- SCTransform(bc_snRNA, ncells = 3000, verbose = FALSE) %>%
RunPCA(verbose = FALSE) %>%
RunUMAP(dims = 1:30)
Class
Subclass
What genes define some cell types?
# Find markers for every cluster compared to all remaining cells, report only the positive ones
# This takes a bit of time so we'll skip it and move on to specific cell types
de_markers_snRNA <- FindAllMarkers(bc_snRNA, only.pos = TRUE, min.pct = 0.25, logfc.threshold = 0.25)
de_markers_snRNA %>%
group_by(cluster) %>%
top_n(n = 2, wt = avg_logFC)
Find markers that define tumors
Find markers that define tumors
Find markers that define stem cells
anchors <- FindTransferAnchors(reference = bc_snRNA, query = breast_cancer,normalization.method = "SCT")
predictions.assay <- TransferData(anchorset = anchors, refdata = bc_snRNA$subclass, prediction.assay = TRUE,
weight.reduction = breast_cancer[["pca"]])
breast_cancer[["predictions"]] <- predictions.assay
DefaultAssay(breast_cancer) <- "predictions"
Let’s have a look at our annotations again.
ca <- readbitmap::read.bitmap("images/Breast Cancer Path.png")
# in the tutorial
# ca <- readbitmap::read.bitmap('/mnt/libs/shared_data/human_breast_cancer_1/images/Breast_Cancer_Path.png')
plot(0:1,0:1,type="n",ann=FALSE,axes=FALSE)
rasterImage(ca,0,0,1,1)
NOTE we’ll work with subclass but if we wanted to work with class do……..
What about the immune microenvironment?
Ducal Carcinoma in situ is depleted of T-cells
(SpatialFeaturePlot(breast_cancer,
features = c("T cells-0"),
pt.size.factor = 1.5, ncol = 2, crop = TRUE) |
SpatialFeaturePlot(breast_cancer,
features = c("T cells-1"),
pt.size.factor = 1.5, ncol = 2, crop = TRUE)) /
(SpatialFeaturePlot(breast_cancer,
features = c("T cells-2"),
pt.size.factor = 1.5, ncol = 2, crop = TRUE) |
SpatialFeaturePlot(breast_cancer,
features = c("T cells-5"),
pt.size.factor = 1.5, ncol = 2, crop = TRUE))
B cells are enriched in the fibrous tissue outside the tumor
SpatialFeaturePlot(breast_cancer,
features = c("B cells"),
pt.size.factor = 1.5, ncol = 2, crop = TRUE)
There seem to be some ductal cells but have a look at our score. Are we confident in this assertion?
SpatialFeaturePlot(breast_cancer,
features = c("Ductal cells"),
pt.size.factor = 1.5, crop = TRUE)
It looks like the ducal carcinoma in situ is enriched for tumor subtypes 8, 10, and 12 but not 3.
(SpatialFeaturePlot(breast_cancer,
features = c("Tumor cells-3"),
pt.size.factor = 1.5, ncol = 2, crop = TRUE) |
SpatialFeaturePlot(breast_cancer,
features = c("Tumor cells-8"),
pt.size.factor = 1.5, ncol = 2, crop = TRUE)) /
(SpatialFeaturePlot(breast_cancer,
features = c("Tumor cells-10"),
pt.size.factor = 1.5, ncol = 2, crop = TRUE) |
SpatialFeaturePlot(breast_cancer,
features = c("Tumor cells-12"),
pt.size.factor = 1.5, ncol = 2, crop = TRUE))
Like the Ductal Cells, we might not be as confident in the Tumor Stem Cells but this might make sense considering the 10x snRNA dataset and the Visium dataset are from different individuals.
SpatialFeaturePlot(breast_cancer,
features = c("CD49f-hi MaSCs"),
pt.size.factor = 1.5,crop = TRUE) +
plot_annotation(
title = 'Tumor Stem Cells')
10x Genomics, patrick.roelli@10xgenomics.com ↩︎
10x Genomics, stefania.giacomello@10xgenomics.com↩︎
10x Genomics, stephen.williams@10xgenomics.com↩︎